---
title: '<a href="#landing-page" style="color: white">The Missing Majority Dashboard</a>'
author: '<a href="https://remi-theriault.com" style="color: white">Rémi Thériault</a>'
output:
  flexdashboard::flex_dashboard:
    orientation: rows
    vertical_layout: scroll
    social: menu
    source_code: embed
    # theme:
    #   version: 3
    #   primary: "#61ADCB"
    #   bootswatch: minty
    # runtime: shinyrmd
    #lumen
    storyboard: false
    favicon: logo.ico
    css: style.css
---
<script>
document.querySelector(".navbar-header").innerHTML =
"<a href=\"#home\" class=\"navbar-brand navbar-inverse\">The Missing Majority Dashboard</a>";
</script>
# Home {.hidden}
```{r setup, include=FALSE}
query_pubmed <- FALSE
options(scipen = 999)
```
```{r packages}
# Load packages
library(pubDashboard)
library(dplyr)
library(ggflags)
read_data <- FALSE
```
```{r get_historic_data, eval=read_data}
data <- read_bind_all_data()
# We filter for year 1987 because there are almost no publications before that
data <- data %>%
  filter(year >= 1987)
```
```{r exclude_other_journals, eval=read_data}
stats_continent <- table_continent(data, datatable = FALSE)
stats_country <- table_country(data, datatable = FALSE)
```
## Row 1 {data-height=400}
### {data-width=1160}
[The Global South continues to be underrepresented in behavioural science]{.big_center}
### **First authors in behavioural science from...**
<!-- #### First authors in behavioural science from... -->
```{r}
rate <- 1
flexdashboard::gauge(
  rate,
  min = 0,
  max = 100,
  symbol = '%',
  sectors = flexdashboard::gaugeSectors(
    danger = c(0, 40),
    warning = c(40, 79),
    success = c(80, 100)
  ),
  abbreviateDecimals = 1,
  href = "#continent",
  label = "Latin America & Africa"
)
```
<!-- #### From Latin America and Africa -->
```{r, eval=TRUE}
rate <- 55
flexdashboard::gauge(
  rate,
  min = 0,
  max = 100,
  symbol = '%',
  sectors = flexdashboard::gaugeSectors(
    success = c(0, 40),
    warning = c(40, 79),
    danger = c(80, 100)
  ),
  abbreviateDecimals = 1,
  href = "#country",
  label = "USA"
)
```
<!-- #### From the US -->
```{r, eval=TRUE}
rate <- 85
flexdashboard::gauge(
  rate,
  min = 0,
  max = 100,
  symbol = '%',
  sectors = flexdashboard::gaugeSectors(
    danger = c(0, 40),
    warning = c(40, 79),
    success = c(80, 100)
  ),
  abbreviateDecimals = 1,
  href = "#continent",
  label = "North America & Europe"
)
```
<!-- #### From North America and Europe -->
## Row 2
### **Representativeness of First Authors in Behavioural Science**
The majority of the world’s population lives outside the Global North. Yet this majority only represents a small fraction of first authors in behavioural science research, who tend to be located in North America or Europe, and are mostly based in the US ([Thalmayer et al., 2021](https://psycnet.apa.org/doi/10.1037/amp0000622), [Arnett, 2008](https://doi.org/10.1037/0003-066x.63.7.602)). Thus, most of the world, and especially Africa, Latin America, and Asia, is underrepresented, which could affect the validity and generalizability of psychological research. This dashboard presents some aggregated data by continent, country, year, and journal (for first authors only), to better document this trend over time and, possibly, inform future public policy on the matter. *Note:* This dashboard is a work in progress (version Alpha).
If this matters to you, please [reach out](https://www.busara.global/partner-with-us/).
---
**How to cite this dashboard?**
Thériault, R., & Forscher, P. (2024). *The Missing Majority Dashboard*. https://remi-theriault.com/dashboards/busara
### **Choice of Journals**
The data from this report originally included information about publications from six psychology journals (*Developmental Psychology*, *Journal of Personality and Social Psychology*, *Journal of Abnormal Psychology*, *Journal of Family Psychology*, *Health Psychology*, and *Journal of Educational Psychology*), for the years 1987 to 2023.
These journals were initially selected based on Arnett and colleagues' papers. The dashboard now includes many more behavioural science journals, which were selected by soliciting input from colleagues to broadly represent the interdisciplinary field of behavioural science and the fields that contribute to it (i.e., economics and psychology). You can see the full list of journals on the [Methods] tab. If you think a journal should be there and it's not, please open a [GitHub issue](https://github.com/rempsyc/busara_dashboard/issues/) and we'll add it.
---
*This dashboard was created with the `pubmedDashboard` package in R: https://rempsyc.github.io/pubmedDashboard/.*
### **About the Authors**
**[Rémi Thériault](https://remi-theriault.com/)** is currently a PhD candidate in Psychology at the Université du Québec à Montréal, Canada. Overall, Rémi is passionate about putting social-psychological research to use to increase people’s well-being and intrinsic motivation to help one another. He also has a (tiny bit obsessive) passion for programming with R.
**[Patrick Forscher](https://busaracenter.org/patrick-s-forscher/)** is the primary collaborator on this project. He is currently the Director of the Culture, Research Ethics, and MEthods (CREME) Meta-Research Team at the Busara Center for Behavioral Economics. The dashboard was Patrick's original vision, and frequently benefits from his creative input.
**[Busara](https://www.busara.global/)** is the dashboard's sponsor. Busara works with researchers and organizations to advance and apply behavioral science in pursuit of poverty alleviation. They use behavioral science to design solutions for partner organisations that are working to make lives better in the Global South.
```{r instructions_reminder_table}
instructions_figure <- "
### Instructions for Figures\n
####\n
The scatter plots use the [`plotly`](https://plotly.com/r/) package, which converts regular images into interactive web graphics via the open source JavaScript graphing library plotly.js.
Hovering over a plotly figure opens the [Chart Studio Modebar](https://plotly.com/chart-studio-help/getting-to-know-the-plotly-modebar/) on the top-right. It includes: (a) Download Plot as a PNG; (b) Zoom and Pan Buttons; (c) Zoom In/Out; (d) Autoscale and Reset Axes; and (e) Hover Options.\n
{width=100%}
**Interacting With Data Points and Legend Groupings**\n
1. When hovering over individual data points, you can see the raw data.
2. When hovering over the regression line, you can see the predicted data.
3. Click, hold, and drag to zoom into a specific window of the chart. Double-click anywhere to come back to the original zoom level.
4. Double-clicking on a group or line will isolate it. Double-clicking again will discontinue the isolation.
5. Single-clicking one or more groups or lines will remove them. Single-clicking again will bring them back.
6. When the legend is too long, use the scroll bar to see the rest of it.\n
"
instructions_table <- "
> \\* Percentages are calculated after excluding missing values. The *Missing* column shows the real percentage of missing values.\n
### Instructions for Tables\n
####\n
All tables use the [DT](https://rstudio.github.io/DT/) package, which converts regular dataframes into interactive datatable HTML table widgets via the open source JavaScript library DataTables.\n
Here are some tips to make the most of these tables.\n
1. You can click on the top left to change the number of entries to show.
2. You can change page by clicking on the bottom right area.
3. You can sort by column by clicking on the column of your choice.
4. You can search for specific values (for example journals or countries) by using the search bar at the top right.\n
{width=100%}
"
```
# Continent {data-navmenu="Economics \\ Psychology"}
## Row 1 {data-height=700}
### Waffle plot of journal paper percentages, by continent (each square = 1% of data) {data-width=1460}
```{r}
# citation1 <- expression(atop(paste("Thériault, R., & Forscher, P. (2024).", italic("The Missing Majority Dashboard.")), "https://remi-theriault.com/dashboards/missing_majority"))
#
# ggplot2::qplot(mtcars$mpg) +
# ggplot2::labs(caption = citation1)
citation <- "Thériault, R., & Forscher, P. (2024). _The Missing Majority Dashboard_. <br>https\\://remi-theriault.com/dashboards/missing_majority"
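# Placeholder histogram; ggplot() + geom_histogram() replaces the deprecated qplot()
ggplot2::ggplot(mtcars, ggplot2::aes(x = mpg)) +
  ggplot2::geom_histogram() +
  ggplot2::xlab(citation) +
  ggplot2::theme(axis.title.x = ggtext::element_markdown(hjust = 1, size = 20))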
```
```{r, results="asis"}
cat(instructions_figure)
```
## Row 2 {data-height=700}
### Table of journal paper percentages, by continent {data-width=1000}
```{r}
knitr::kable(mtcars)
```
```{r, results="asis"}
cat(instructions_table)
```
# Attempt 2 {data-navmenu="Economics \\ Psychology"}
# Methods
## Row 1 {data-height=700}
### **METHOD & DATA**
The dashboard includes information about the articles (e.g., title, abstract) as well as on the authors, such as university of affiliation. I have obtained these data from PubMed using the PubMed API through the `easyPubMed` package. I have determined the country of the first author of each paper based on the affiliation address by matching the university name with a world university names database obtained from GitHub.
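As a rough illustration of the matching step described above, the sketch below looks up each affiliation string in a small university table and returns the corresponding country. The data frame `universities`, the function `match_country()`, and the sample affiliations are all hypothetical; this is not the dashboard's actual code, only a minimal base-R sketch that assumes matching is done by case-insensitive substring search.

```r
# Hypothetical lookup table standing in for the world university names database
universities <- data.frame(
  university = c("University of Nairobi", "McGill University"),
  country = c("Kenya", "Canada"),
  stringsAsFactors = FALSE
)

# Return the country of the first university whose name appears in the
# affiliation string, or NA when no university name is found
match_country <- function(affiliation, db = universities) {
  hits <- vapply(
    db$university,
    function(name) grepl(tolower(name), tolower(affiliation), fixed = TRUE),
    logical(1)
  )
  if (any(hits)) db$country[which(hits)[1]] else NA_character_
}

affiliations <- c(
  "Department of Psychology, McGill University, Montreal, Canada.",
  "Department of Psychology." # department only: no university to match
)
countries <- vapply(affiliations, match_country, character(1), USE.NAMES = FALSE)
countries
```

In this toy example, the second affiliation contains only a department, so its country is left as `NA`, which is the situation that ends up in the missing data.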
The full current list of journals queried can be obtained with the following R code: `pubmedDashboard::journal_field$journal_short`. Note that PLOS One was only included until 2011 because its number of papers becomes too large for this dashboard to handle after that.
On the right, we can see the list of journals included in this dashboard following the above query. The table also includes the number of publications for each journal; this is our sample size of sorts. These data help us understand in part the quality of the PubMed data, to the extent that some journals have poor PubMed indexing or no indexing at all. For example, the journals below had no PubMed matches at all:
```{r, journal_match, eval=read_data}
missing_journals <- detect_missing_journals(data)
missing_journals %>%
  filter(found == FALSE)
```
### **JOURNAL COUNT**
```{r, journal_count, eval=read_data}
table_journal_count(data = data)
```
# Limitations {data-navmenu="Limitations"}
## Row 1 {data-height=700}
### **LIMITATIONS**
This dashboard should be considered with the following limitations in mind:
**1. The dashboard data are not per capita (yet)**
Raw counts and percentages do not account for the population size of each region or country, which could bias comparisons between them.
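To make the per-capita idea concrete, the sketch below divides a paper count by population to get papers per million inhabitants. All numbers and names (`counts`, `share_percent`, `papers_per_million`) are made up for illustration and are not dashboard data.

```r
# Made-up counts and populations, for illustration only (not dashboard data)
counts <- data.frame(
  region = c("Region A", "Region B"),
  papers = c(900, 100),
  population_millions = c(300, 20)
)

# Raw share of papers, in percent (not adjusted for population)
counts$share_percent <- 100 * counts$papers / sum(counts$papers)

# Papers per million inhabitants (a per-capita measure)
counts$papers_per_million <- counts$papers / counts$population_millions

counts
```

In this toy example, Region A accounts for 90% of papers but publishes 3 papers per million inhabitants, versus 5 for Region B, so the ranking reverses once population is taken into account.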
**2. The way in which we define which journals represent behavioural science as a whole may be problematic**
Authors from non-English-speaking countries (including in the Global South, and especially Latin America) are more likely to publish in languages other than English (like Spanish or French journals), which are less well-known. Saying that only English-language journals truly represent "Behavioural Science" as a whole is somewhat problematic. Would that mean that all the Spanish and French behavioural science journals do not represent behavioural science as well, just because English monolinguals cannot read them?
**3. Some lesser-known journals are not indexed on PubMed**
Smaller local journals that are more likely to include authors from the Global South (e.g., some African journals) are sometimes not indexed on PubMed. This can contribute to misrepresenting the proportion of first authors by region. At the same time, journal visibility does matter on the global stage.
**4. The dashboard has lots of missing data, which is probably not random**
Universities from the Global South and non-English-speaking countries are less likely to be correctly detected, for example because of special characters in names or because less well-known universities are not included in our database. As a result, the country for these publications is more likely to be marked as missing and therefore not be included in the dashboard. This can further under-represent the Global South in our data.
More specifically, some of the papers were missing address information; in many cases, the PubMed API provided only the department and no university. It was not possible to identify the country in these cases (one would need to look at the actual papers one by one to make manual corrections). Furthermore, some university names from the data did not match the university name database obtained from GitHub. In some cases, I have made manual corrections to university names in an attempt to reduce the number of missing values. A table of data with missing countries is accessible in the [Missing Data] tab.
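The effect of missing countries on the reported percentages can be sketched as follows. The vector `country` is invented for illustration; the snippet only shows the general idea of computing percentages after excluding missing values while reporting the share of missing values separately, and is not the dashboard's actual table code.

```r
# Invented first-author countries; NA marks papers whose country was not identified
country <- c("USA", "USA", "USA", "Kenya", NA, NA, "Brazil", "USA", NA, "Kenya")

# Real percentage of missing values, computed over all papers
missing_percent <- 100 * mean(is.na(country))

# Percentages by country, computed after excluding missing values
known <- country[!is.na(country)]
country_percent <- 100 * prop.table(table(known))

missing_percent
round(country_percent, 1)
```

Here 30% of the papers have no identified country, and the percentage reported for each country refers only to the 7 papers with a known country. If the missing papers come disproportionately from the Global South, as suggested above, the percentages for the remaining countries would overstate the share of the Global North.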
### **NEXT STEPS**
Possible future steps include: (a) obtaining a better, more current university name database (that includes the country of each university), (b) making manual corrections for other research institutes not included in the university database, (c) hosting DT tables on a server to speed up the website and allow the inclusion of a DT table for exploring the raw data, (d) finding a way to use country flags for the countries-by-journal figure, and (e) using per-capita data to make the dashboard more representative.
# Missing Data {data-navmenu="Limitations"}
## Row 1 {.tabset .tabset-fade}
### This table allows investigating why the country/university could not be identified {data-height=700}
### Important Note